Figure 1: Illumina Sequencing Workflow
Genome sequcencing data is often stored in one of two formats, FASTA and FASTQ text files. For example a FASTA files looks like the following:
Figure 2: FASTA file format
We can also store confidence or quality scores using a FASTQ format:
In order to translate FASTQ quality scores:
Figure 4: Fastq Encoding
And now converting to confidence probabilities:
Figure 5: Fastq encoding quality scores
FastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) provides a simple way to do QC checks on raw sequence data:
To rund FastQC you can launch the GUI app, or run form the command line:
./FastQC/fastqc reads/R01_10_short500K.fq.gz
## Started analysis of R01_10_short500K.fq.gz
## Approx 5% complete for R01_10_short500K.fq.gz
## Approx 10% complete for R01_10_short500K.fq.gz
## Approx 15% complete for R01_10_short500K.fq.gz
## Approx 20% complete for R01_10_short500K.fq.gz
## Approx 25% complete for R01_10_short500K.fq.gz
## Approx 30% complete for R01_10_short500K.fq.gz
## Approx 35% complete for R01_10_short500K.fq.gz
## Approx 40% complete for R01_10_short500K.fq.gz
## Approx 45% complete for R01_10_short500K.fq.gz
## Approx 50% complete for R01_10_short500K.fq.gz
## Approx 55% complete for R01_10_short500K.fq.gz
## Approx 60% complete for R01_10_short500K.fq.gz
## Approx 65% complete for R01_10_short500K.fq.gz
## Approx 70% complete for R01_10_short500K.fq.gz
## Approx 75% complete for R01_10_short500K.fq.gz
## Approx 80% complete for R01_10_short500K.fq.gz
## Approx 85% complete for R01_10_short500K.fq.gz
## Approx 90% complete for R01_10_short500K.fq.gz
## Approx 95% complete for R01_10_short500K.fq.gz
## Analysis complete for R01_10_short500K.fq.gz
Figure 5: Fastqc scores
Figure 6: Fastqc score distribution
Figure 7: Fastqc base distribution
Figure 8: Fastqc N distribution
Find the genomic Location of origin for the sequencing read. Software: Bowtie2, TopHat, STAR, Subread/Rsubread, many others!
Figure 9: Sequence read alignment
Here is quick tutorial on sequnce aligment.
The following userguide will be helpful for you:
http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf
Abraham Lincoln: “Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” (4 minutes indexing the genome, 2 minutes aligning the reads)
Note that you will rarely do this for human alignment. You will usually download an existing index given to you by others who have already done this work. You will do this often if you are aligning microbial reads, e.g. MTB or some other organism for which others have not already made your index for you.
buildindex(basename="genome/ucsc.hg19.chr1_120-150M",reference="genome/ucsc.hg19.chr1_120-150M.fasta.gz")
## WARNING: your system seems to be 32-bit. Rsubread supports 32-bit systems to a very limited level.
## It is highly recommended to run Rsubread on a 64-bit system to avoid errors.
##
##
## ========== _____ _ _ ____ _____ ______ _____
## ===== / ____| | | | _ \| __ \| ____| /\ | __ \
## ===== | (___ | | | | |_) | |__) | |__ / \ | | | |
## ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
## ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
## ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
## Rsubread 2.12.3
##
## //================================= setting ==================================\\
## || ||
## || Index name : ucsc.hg19.chr1_120-150M ||
## || Index space : base space ||
## || Index split : no-split ||
## || Repeat threshold : 100 repeats ||
## || Gapped index : no ||
## || ||
## || Free / total memory : 8.6GB / 24.0GB ||
## || ||
## || Input files : 1 file in total ||
## || o ucsc.hg19.chr1_120-150M.fasta.gz ||
## || ||
## \\============================================================================//
##
## //================================= Running ==================================\\
## || ||
## || Check the integrity of provided reference sequences ... ||
## || There were 4800000 notes for reference sequences. ||
## || The notes can be found in the log file, 'genome/ucsc.hg19.chr1_120-150 ... ||
## || Scan uninformative subreads in reference sequences ... ||
## || 516 uninformative subreads were found. ||
## || These subreads were excluded from index building. ||
## || Estimate the index size... ||
## || 8%, 0 mins elapsed, rate=8996.1k bps/s ||
## || 16%, 0 mins elapsed, rate=13047.9k bps/s ||
## || 24%, 0 mins elapsed, rate=15352.6k bps/s ||
## || 33%, 0 mins elapsed, rate=16837.5k bps/s ||
## || 41%, 0 mins elapsed, rate=17873.2k bps/s ||
## || 49%, 0 mins elapsed, rate=18636.9k bps/s ||
## || 58%, 0 mins elapsed, rate=19216.5k bps/s ||
## || 66%, 0 mins elapsed, rate=19685.0k bps/s ||
## || 74%, 0 mins elapsed, rate=20060.2k bps/s ||
## || 83%, 0 mins elapsed, rate=18924.4k bps/s ||
## || 91%, 0 mins elapsed, rate=17772.8k bps/s ||
## || 3.0 GB of memory is needed for index building. ||
## || Build the index... ||
## || 8%, 0 mins elapsed, rate=2073.9k bps/s ||
## || 16%, 0 mins elapsed, rate=3399.5k bps/s ||
## || 24%, 0 mins elapsed, rate=4318.1k bps/s ||
## || 33%, 0 mins elapsed, rate=4990.7k bps/s ||
## || 41%, 0 mins elapsed, rate=5509.0k bps/s ||
## || 49%, 0 mins elapsed, rate=5917.0k bps/s ||
## || 58%, 0 mins elapsed, rate=6247.5k bps/s ||
## || 66%, 0 mins elapsed, rate=6521.4k bps/s ||
## || 74%, 0 mins elapsed, rate=6751.2k bps/s ||
## || 83%, 0 mins elapsed, rate=6075.1k bps/s ||
## || 91%, 0 mins elapsed, rate=5500.2k bps/s ||
## || Save current index block... ||
## || [ 0.0% finished ] ||
## || [ 10.0% finished ] ||
## || [ 20.0% finished ] ||
## || [ 30.0% finished ] ||
## || [ 40.0% finished ] ||
## || [ 50.0% finished ] ||
## || [ 60.0% finished ] ||
## || [ 70.0% finished ] ||
## || [ 80.0% finished ] ||
## || [ 90.0% finished ] ||
## || [ 100.0% finished ] ||
## || ||
## || Total running time: 0.4 minutes. ||
## || Index genome/ucsc.hg19.chr1_120-150M was successfully built. ||
## || ||
## \\============================================================================//
Note that this outputs results in a .bam file and not a .sam file
align(index="genome/ucsc.hg19.chr1_120-150M",readfile1="reads/R01_10_short500K.fq.gz",output_file="alignments/R01_10_short.bam", nthreads=4)
## WARNING: your system seems to be 32-bit. Rsubread supports 32-bit systems to a very limited level.
## It is highly recommended to run Rsubread on a 64-bit system to avoid errors.
##
##
## ========== _____ _ _ ____ _____ ______ _____
## ===== / ____| | | | _ \| __ \| ____| /\ | __ \
## ===== | (___ | | | | |_) | |__) | |__ / \ | | | |
## ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
## ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
## ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
## Rsubread 2.12.3
##
## //================================= setting ==================================\\
## || ||
## || Function : Read alignment (RNA-Seq) ||
## || Input file : R01_10_short500K.fq.gz ||
## || Output file : R01_10_short.bam (BAM) ||
## || Index name : ucsc.hg19.chr1_120-150M ||
## || ||
## || ------------------------------------ ||
## || ||
## || Threads : 4 ||
## || Phred offset : 33 ||
## || Min votes : 3 / 10 ||
## || Max mismatches : 3 ||
## || Max indel length : 5 ||
## || Report multi-mapping reads : yes ||
## || Max alignments per multi-mapping read : 1 ||
## || ||
## \\============================================================================//
##
## //================= Running (21-Jun-2023 15:02:44, pid=6394) =================\\
## || ||
## || Check the input reads. ||
## || The input file contains base space reads. ||
## || Initialise the memory objects. ||
## || Estimate the mean read length. ||
## || The range of Phred scores observed in the data is [2,37] ||
## || Create the output BAM file. ||
## || Check the index. ||
## || Init the voting space. ||
## || Global environment is initialised. ||
## || Load the 1-th index block... ||
## || The index block has been loaded. ||
## || Start read mapping in chunk. ||
## || 5% completed, 3.2 mins elapsed, rate=93.8k reads per second ||
## || 12% completed, 3.2 mins elapsed, rate=113.7k reads per second ||
## || 18% completed, 3.2 mins elapsed, rate=124.2k reads per second ||
## || 25% completed, 3.2 mins elapsed, rate=125.8k reads per second ||
## || 31% completed, 3.2 mins elapsed, rate=125.4k reads per second ||
## || 38% completed, 3.2 mins elapsed, rate=126.1k reads per second ||
## || 45% completed, 3.2 mins elapsed, rate=126.5k reads per second ||
## || 52% completed, 3.2 mins elapsed, rate=127.8k reads per second ||
## || 58% completed, 3.2 mins elapsed, rate=128.3k reads per second ||
## || 65% completed, 3.2 mins elapsed, rate=128.6k reads per second ||
## || 69% completed, 3.2 mins elapsed, rate=1.8k reads per second ||
## || 73% completed, 3.2 mins elapsed, rate=1.9k reads per second ||
## || 76% completed, 3.2 mins elapsed, rate=2.0k reads per second ||
## || 80% completed, 3.2 mins elapsed, rate=2.1k reads per second ||
## || 83% completed, 3.2 mins elapsed, rate=2.2k reads per second ||
## || 86% completed, 3.2 mins elapsed, rate=2.3k reads per second ||
## || 90% completed, 3.2 mins elapsed, rate=2.3k reads per second ||
## || 93% completed, 3.2 mins elapsed, rate=2.4k reads per second ||
## || 96% completed, 3.2 mins elapsed, rate=2.5k reads per second ||
## || 99% completed, 3.2 mins elapsed, rate=2.6k reads per second ||
## || ||
## || Completed successfully. ||
## || ||
## \\==================================== ====================================//
##
## //================================ Summary =================================\\
## || ||
## || Total reads : 503,568 ||
## || Mapped : 376,964 (74.9%) ||
## || Uniquely mapped : 317,555 ||
## || Multi-mapping : 59,409 ||
## || ||
## || Unmapped : 126,604 ||
## || ||
## || Indels : 2,236 ||
## || ||
## || Running time : 3.2 minutes ||
## || ||
## \\============================================================================//
## R01_10_short.bam
## Total_reads 503568
## Mapped_reads 376964
## Uniquely_mapped_reads 317555
## Multi_mapping_reads 59409
## Unmapped_reads 126604
## Indels 2236
Note that Rsubread outputs a .bam file (bam = binary alignment map) and not a .sam file (sam = sequence alignment map). Here is some information about a .sam file:
https://en.wikipedia.org/wiki/SAM_(file_format)
Figure 10: SAM and BAM file format
https://samtools.github.io/hts-specs/SAMv1.pdf
To convert .sam to .bam or vice versa, a package called Rsamtools. Using Rsamtools, you can convert bam to sam as follows:
asSam("alignments/R01_10_short.bam", overwrite=T)
## [1] "alignments/R01_10_short.sam"
# To convert to bam:
#asBam("alignments/R01_10_short.bam")
Makes a system call to the Mac terminal to generate a .sam file
Now we can count reads hitting genes. Approaches/software:
Figure 11: Feature Counts
fCountsList = featureCounts("alignments/R01_10_short.bam", annot.ext="genome/genes.chr1_120-150M.gtf", isGTFAnnotationFile=TRUE)
## Warning: your system seems to be 32-bit. Rsubread supports 32-bit systems to a limited level only.
## We recommend that Rsubread be run on 64-bit systems to avoid any possible problems.
##
## ========== _____ _ _ ____ _____ ______ _____
## ===== / ____| | | | _ \| __ \| ____| /\ | __ \
## ===== | (___ | | | | |_) | |__) | |__ / \ | | | |
## ==== \___ \| | | | _ <| _ /| __| / /\ \ | | | |
## ==== ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
## ========== |_____/ \____/|____/|_| \_\______/_/ \_\_____/
## Rsubread 2.12.3
##
## //========================== featureCounts setting ===========================\\
## || ||
## || Input files : 1 BAM file ||
## || ||
## || R01_10_short.bam ||
## || ||
## || Paired-end : no ||
## || Count read pairs : no ||
## || Annotation : genes.chr1_120-150M.gtf (GTF) ||
## || Dir for temp files : . ||
## || Threads : 1 ||
## || Level : meta-feature level ||
## || Multimapping reads : counted ||
## || Multi-overlapping reads : not counted ||
## || Min overlapping bases : 1 ||
## || ||
## \\============================================================================//
##
## //================================= Running ==================================\\
## || ||
## || Load annotation file genes.chr1_120-150M.gtf ... ||
## || Features : 1918 ||
## || Meta-features : 104 ||
## || Chromosomes/contigs : 1 ||
## || ||
## || Process BAM file R01_10_short.bam... ||
## || Single-end reads are included. ||
## || Total alignments : 503568 ||
## || Successfully assigned alignments : 4561 (0.9%) ||
## || Running time : 0.01 minutes ||
## || ||
## || Write the final count table. ||
## || Write the read assignment summary. ||
## || ||
## \\============================================================================//
featureCounts = cbind(fCountsList$annotation[,1], fCountsList$counts)
write.table(featureCounts, "alignments/R01_10_short.features.txt", sep="\t", col.names=FALSE, row.names=FALSE, quote=FALSE)
Figure 12: Feature Counts
Figure 13: Feature Counts
#install.packages("devtools")
#devtools::install_github("wevanjohnson/singleCellTK")
library(singleCellTK)
singleCellTK()
### open features_combined.txt
### and meta_data.txt
Batch Effect: Non-biological variation due to differences in batches of data that confound the relationships between covariates of interest.
Batch effects are caused by differences in:
Batch Effect Examples:
Example 1 resulted from an oligonucleotide microarray (Affymetrix HG-U133A) experiment on human lung fibroblast cells (IMR90) designed to reveal whether exposing mammalian cells to nitric oxide (NO) stabilizes mRNAs. Control samples and samples exposed to NO for 1 h were then transcription inhibited for 7.5 h. Microarray data were collected at baseline (0 h, just before transcription inhibition) and at the end of the experiment (after 7.5 h) for both the control and the NO-treated group. It was hypothesized that NO will induce or inhibit the expression of some genes, but would also stabilize the mRNA of many genes, preventing them from being degraded after 7.5 h.
Figure 14
Figure 15
Single peptide predictors of disease (AUC): 0.82, 0.76, 0.74, 0.74, 0.70 (+12 more >0.6)
Single peptide predictors of batch (AUC): 0.99, 0.94, 0.91, 0.86, 0.86, 0.84, 0.84, 0.84, 0.83, 0.82 (+7 more >0.6)
Predict batch better than disease!
Need to normalize data because of:
Figure 18
Alignment and feature counting result in discrete count data (i.e. the number of reads to each gene). A first thought might be to use a Poisson distribution to model the counts. However, the Poisson makes a strict mean-variance assumption (i.e. they are the same. Studies have demonstrated that a negative binomial fits data better.
Figure 19
A data structure is a particular way of organizing data in a computer so that it can be used effectively. The idea is to reduce the space and time complexities of different tasks.
Data structures in R programming are tools for holding multiple values, variables, and sometimes functions
Please think very carefully about the way you manage and store your data! This can make your life much easier and make your code and data cleaner and more portable!
There are advanced R data structures, S3 and
S4 class objects, that can facilitate object orientated
programming. One useful example of an S4 class data structure is the
SummarizedExperiment object.
counts <- read.table("downstream_analysis/features_combined.txt", sep="\t",
header=T, row.names=1)
meta_data <- read.table("downstream_analysis/meta_data.txt", sep="\t",
header=T, row.names=1)
group <- meta_data$Disease
sce_hivtb <- SummarizedExperiment(assays=list(counts=counts),
colData = meta_data)
sce_hivtb <- mkAssay(sce_hivtb, log = TRUE, counts_to_CPM = TRUE)
assays(sce_hivtb)
## List of length 4
## names(4): counts log_counts counts_cpm log_counts_cpm
set.seed(1)
umap_out <- umap(t(assay(sce_hivtb,"log_counts_cpm")))
umap_plot <- as.data.frame(umap_out$layout)
umap_plot$Disease <- as.factor(sce_hivtb$Disease)
g <- ggplot(umap_plot,
aes(x=V1, y=V2, color=Disease)) +
geom_point(size=1.5) + xlab("UMAP1") + ylab("UMAP1") +
theme(plot.title = element_text(hjust = 0.5)) +
ggtitle("UMAP Plot")
plot(g)
Figure 21
Implements statistical methods for DE analysis based on the negative binomial model:
counts<-counts[which(rowSums(cpm(counts))>1),] #Gene Filtering
dge <- DGEList(counts=counts, group=group) #Computes library size
dge <- calcNormFactors(dge) #TMM normalization
design<-model.matrix(~group)
dge<-estimateDisp(counts,design) #Estimates common, trended and tagwise dispersion
In negative binomial models, each gene is given a dispersion parameter. Dispersions control the variances of the gene counts and underestimation will lead to false discovery and overestimation may lead to a lower rate of true discovery
#perform likelihood ratio tests
fit<-glmFit(counts,design, dispersion=dge$tagwise.dispersion) #fits a negative binomial GLM with the dispersion estimates
lrt<-glmLRT(fit, coef=2) #Performs likelihood ratio test, comparing the goodness of the fit of the full versus reduced model
topTags(lrt)
## Coefficient: grouptb_hiv
## logFC logCPM LR PValue FDR
## IL1R2 4.334196 8.207931 100.01344 1.513665e-23 2.933634e-19
## AP3B2 5.758193 2.952770 71.58586 2.654527e-17 2.572370e-13
## FCGR1C 2.818498 4.536149 65.07230 7.220003e-16 4.664362e-12
## VNN1 3.150158 8.071776 64.39089 1.020287e-15 4.943545e-12
## CYP1B1 3.135471 6.873000 63.00698 2.059751e-15 7.984006e-12
## IL18R1 2.726131 6.487474 60.75863 6.451968e-15 2.084093e-11
## SLC29A1 -4.135274 3.969739 59.69166 1.109455e-14 3.071764e-11
## CACNG8 -4.134373 3.189823 58.97316 1.598376e-14 3.872266e-11
## SOCS3 2.665230 6.475922 57.71505 3.029759e-14 6.524418e-11
## ZAK 2.601746 6.126750 56.08435 6.942749e-14 1.345574e-10
#perform quasi-likelihood F-tests
#Replace the chisquare approximation to the likelihood ratio statistic with a quasi-likelihood F-test, more control of error rate
fit<-glmQLFit(counts, design, dispersion=dge$tagwise.dispersion) #use for small dataset, reflects uncertainty in estimating dispersion for each gene, more robust and reliable error rate #control when the number of replicates is small
qlf<-glmQLFTest(fit, coef=2)
topTags(qlf)
## Coefficient: grouptb_hiv
## logFC logCPM F PValue FDR
## IL1R2 4.334196 8.207931 98.25774 3.688811e-23 7.149284e-19
## AP3B2 5.758193 2.952770 68.34333 1.376533e-16 1.333930e-12
## FCGR1C 2.818498 4.536149 63.94549 1.281415e-15 8.278371e-12
## VNN1 3.150158 8.071776 63.35564 1.728679e-15 8.375881e-12
## CYP1B1 3.135471 6.873000 62.52984 2.628931e-15 1.019026e-11
## IL18R1 2.726131 6.487474 60.35328 7.940072e-15 2.564775e-11
## SLC29A1 -4.135274 3.969739 58.17236 2.404908e-14 6.658504e-11
## SOCS3 2.665230 6.475922 57.33072 3.688947e-14 8.936935e-11
## CACNG8 -4.134373 3.189823 56.59826 5.353531e-14 1.152853e-10
## ZAK 2.601746 6.126750 55.70915 8.414232e-14 1.630762e-10
#For visualization, heatmaps/PCA
Logcpm<-cpm(counts,log=TRUE)
dds <- DESeqDataSetFromMatrix(countData = counts, colData=meta_data, design=~Disease)
## Warning in DESeqDataSet(se, design = design, ignoreRank): some variables in
## design formula are characters, converting to factors
#colData is a data frame of demographic/phenotypic data
dds<-dds[rowSums(counts(dds))>1,] #Gene Filtering
dds<-DESeq(dds) #Performs estimation of size factors,dispersion, and negative binomial GLM f#itting
## estimating size factors
## estimating dispersions
## gene-wise dispersion estimates
## mean-dispersion relationship
## final dispersion estimates
## fitting model and testing
## -- replacing outliers and refitting for 180 genes
## -- DESeq argument 'minReplicatesForReplace' = 7
## -- original counts are preserved in counts(dds)
## estimating dispersions
## fitting model and testing
res <- results(dds)[order(results(dds)[,6]),]
res[1:10,]
## log2 fold change (MLE): Disease tb hiv art vs hiv only
## Wald test p-value: Disease tb hiv art vs hiv only
## DataFrame with 10 rows and 6 columns
## baseMean log2FoldChange lfcSE stat pvalue padj
## <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
## ZEB2 1702.8164 1.75543 0.217662 8.06493 7.32802e-16 1.19996e-11
## DCUN1D3 48.7029 2.52495 0.361657 6.98163 2.91765e-12 2.38883e-08
## TIPARP 724.9429 2.22978 0.334020 6.67558 2.46254e-11 1.34414e-07
## ITPKC 196.8723 3.04144 0.465638 6.53177 6.49981e-11 2.66086e-07
## LAIR1 1528.6852 3.01392 0.472158 6.38328 1.73337e-10 4.40512e-07
## COPS4 559.7527 2.67612 0.419911 6.37306 1.85294e-10 4.40512e-07
## IGF2BP3 360.1066 3.29761 0.517782 6.36873 1.90604e-10 4.40512e-07
## GSAP 964.2702 1.26506 0.199220 6.35007 2.15212e-10 4.40512e-07
## RBMS1 3297.7791 2.51612 0.400420 6.28370 3.30604e-10 6.01516e-07
## ZDHHC20 554.9810 2.51989 0.403490 6.24523 4.23175e-10 6.92949e-07
# Make a Heatmap of DEGs
mat = as.matrix(assay(sce_hivtb,"log_counts_cpm"))[order(results(dds)[,6])[1:100],] # Using first 1000 genes to simplify
mat = t(scale(t(mat)))
df=data.frame(Disease=colData(sce_hivtb)$Disease)
ha = HeatmapAnnotation(df = df, col = list(Disease=c("tb_hiv"="Red","hiv_only"="Blue", "tb_hiv_art"="Green")))
Heatmap(mat,show_row_names=F,show_column_names = F, top_annotation = ha)
dge <- DGEList(counts=counts, group=group) #From edgeR, Computes library size
counts<-counts[which(rowSums(cpm(counts))>1),] #Gene Filtering
dge <- DGEList(counts=counts, group=group) #Re-compute library size
dge <- calcNormFactors(dge) #TMM normalization
design<-model.matrix(~group)
v<-voom(dge, design) #voom transform to calculate weights to eliminate mean-variance #relationship
#use usual limma pipelines
fit<-lmFit(v,design)
fit<-eBayes(fit)
topTable(fit, coef=ncol(design))
## logFC AveExpr t P.Value adj.P.Val B
## DCUN1D3 2.415048 1.1213257 7.597067 5.330234e-09 5.174623e-05 10.335712
## ZEB2 1.681109 6.4208329 7.439069 8.537472e-09 5.174623e-05 10.090462
## LINC01093 5.474634 0.8710200 7.414057 9.200542e-09 5.174623e-05 9.932749
## FAM151B 3.709561 1.2894208 7.364262 1.067978e-08 5.174623e-05 9.825077
## C7orf61 2.599925 2.7347797 7.206753 1.714091e-08 5.536801e-05 9.443534
## CYP19A1 6.863347 -0.9439843 7.234226 1.578062e-08 5.536801e-05 9.241797
## TIPARP 2.118756 5.0816090 6.625677 9.994987e-08 1.885771e-04 7.773090
## IGF2BP3 3.336976 3.6439397 6.618279 1.022347e-07 1.885771e-04 7.751390
## COL4A2-AS1 5.463005 -3.3142037 6.975753 3.444177e-08 9.535941e-05 7.614380
## LAYN 3.342889 -0.9931241 6.639498 9.581743e-08 1.885771e-04 7.409620
After finding DEGs, look for correlated genes/networks and enriched pathway sets in the gene set using:
Figure 22
Figure 23